ABSTRACT

We discuss some practical aspects of measuring the variability amplitude of faint and distant active
galactic nuclei (AGN), characterized by sparsely sampled lightcurves and low statistics. In such cases
the excess variance, commonly used to estimate the intrinsic lightcurve variance, is affected by strong
biases and uncertainties, since it represents a maximum likelihood variability estimator only for
identical, normally distributed measurement errors and uniform sampling. We performed realistic Monte
Carlo simulations of AGN lightcurves, reproducing both the sampling pattern and measurement errors
typical of multi-epoch deep surveys, such as the XMM-Newton observations of the Chandra Deep
Field South (CDFS), or assuming different sampling patterns that may characterize long surveys with 
sub-optimal observing conditions. We used the results to estimate our ability to measure the intrinsic 
source variability as well as to constrain the observing strategy of future X-ray missions studying 
distant and/or faint AGN populations. 

Subject headings: galaxies: active - sample text - user guide 



1. INTRODUCTION 

Active Galactic Nuclei (AGN) are characterized by
large-amplitude and rapid variability, especially in the
X-ray band, which probably originates in the inner
regions of the accretion disk and the hot corona. One
of the most common tools for examining AGN variability
is the Power Spectral Density Function (PSD). Early
attempts to measure the AGN X-ray PSDs showed that
they have a power-law like shape with a slope of ~ 1.5
(Green et al. 1993; Lawrence & Papadakis 1993). This
result is indicative of a scale-invariant red-noise process,
on timescales ranging from a few hours to years, with no
evidence of periodicities.

In recent years it has become increasingly clear that
there exists at least one characteristic timescale in
the AGN X-ray PSDs. This timescale reveals itself
in the form of "frequency breaks" ($\nu_{br}$) in the
PSD, where the slope changes from a value of ~ -1
below the break to ~ -2 at frequencies higher
than $\nu_{br}$ (see e.g. Uttley, McHardy & Papadakis 2002;
Markowitz et al. 2003). In at least one case, namely
Ark 564, a second break, where the slope changes from
~ -1 to zero, is also detected (Papadakis et al. 2002;
McHardy et al. 2007). These time scales may be linked
to the characteristic disk time scales, such as the dynamical,
thermal or viscous timescale, and appear to correlate
with the BH mass and accretion rate (McHardy et al.
2006; Koerding et al. 2007). Thus variability measurements
represent a tool to investigate both the physics of
the accretion process and the fundamental parameters
($M_{BH}$, $\dot{m}$) of the active nucleus.

So far, our knowledge of the X-ray variability properties
of AGNs is mainly based on the study of a few nearby,
X-ray bright objects, which have been monitored extensively
with RXTE over many years, and for which there
also exist day-long, high signal-to-noise (S/N) XMM-Newton
light curves. At the same time, deep multi-cycle
surveys (e.g. Alexander et al. 2003; Brunner et al. 2008;
Comastri et al. 2011; Xue et al. 2011; also see
Brandt & Hasinger 2005 and references therein) have
been accumulating observations of intermediate and high
(z > 0.5) redshift AGN, thus offering the opportunity to
explore AGN variability at high redshift as well. However,
due to the sparse sampling and the low flux of
most AGN detected in these surveys, it is not possible
to use PSD techniques to study the variability
properties of these objects. For that reason, a different
statistic, namely the excess variance (Nandra et al. 1997;
Turner et al. 1999; Edelson et al. 2002), has been
used to parametrize the variability properties of the high
redshift AGN (Almaini et al. 2000; Paolillo et al. 2004;
Papadakis et al. 2008).

Strictly speaking, the excess variance is a maximum
likelihood estimator of the intrinsic light curve variance
only in the case of uniform sampling and identical,
normally distributed measurement errors (Almaini et al.
2000). A detailed discussion of the statistical properties
of the excess variance and its performance in the
case of red noise PSDs of various slopes and "break"
frequencies, and of different S/N ratios, can be found in
Vaughan et al. (2003). These authors, however, considered
the case of continuously sampled data only, such
as those provided by long XMM observations of nearby
AGNs. In deep multi-cycle surveys, instead, the effects
of sparse and uneven sampling must be taken into
account when investigating the statistical properties of
the excess variance.

The goal of this work is to investigate the performance 
of the excess variance as a measure of the intrinsic AGN
variability. In particular, we consider sources similar to
those observed in multi-epoch surveys and characterized 
by extreme sparsity, due to the observing strategy and 
orbital visibility of the targets. We measure the bias 
and the expected scatter of the excess variance measure- 
ments, and we investigate the dependence of the bias 
on the sampling pattern and gap length, as well as on 
the S/N ratio of the light curve. We believe that our 
results will be useful to researchers who wish to study 
the variability properties of high redshift AGN (an area 
which is still largely unexplored), as well as to under- 
stand the possible limitations of the existing data, and 
to correct (in a statistical sense) for some effects that 
the uneven sampling introduces in the estimation of the 
intrinsic variability. Finally, our results could be of use 
in the determination of the optimal observing strategy 
either for future surveys with the current X-ray satellites 
or for future X-ray missions. 

The paper is organized as follows: in §2 we define the
variability estimator; in §3 we describe the Monte Carlo
simulations of AGN lightcurves, both reproducing the
pattern of the XMM-Newton observations of the CDFS
and further testing more favorable observing strategies.
The applications to future X-ray surveys and the discussion
of our results are presented in the later sections.

2. NORMALIZED EXCESS VARIANCE 

The variability of accreting systems is usually investigated
through the use of the PSD, which gives the light
curve variance per Hz at each temporal frequency. AGN
exhibit a power-law PSD, $S(f) \propto f^{-\beta}$, where $S(f)$
is the power at frequency $f$, with slopes usually in the
range $1 < \beta < 2$. A proper derivation of the PSD (or
of the lightcurve variance, see below) is intrinsically difficult:
extrapolations from any single realization can be
misleading due to the stochastic nature of any red-noise
lightcurve (see discussion in Vaughan et al. 2003). For
real data this task is further affected by the signal-to-noise
ratio of the data, the finite length of the observation
and the sampling pattern.

The analysis of present day light curves of distant 
AGNs is difficult since these sources are usually serendip- 
itously detected in deep surveys. As a result, their light 
curves are characterized by low signal-to-noise ratio as 
well as sparse sampling. In such cases, instead of trying 
to derive the PSD, it is easier (and often only possible) 
to estimate the total light curve variance using the so
called excess variance, which is defined as (Nandra et al.
1997):

$$\sigma^2_{NXS} = \frac{1}{N \bar{x}^2} \sum_{i=1}^{N} \left[ (x_i - \bar{x})^2 - \sigma^2_{err,i} \right], \qquad (1)$$

where $x_i$ and $\sigma_{err,i}$ are the count rate and its error in
the i-th bin, $\bar{x}$ is the mean count rate, and N is the number
of bins used to estimate $\sigma^2_{NXS}$. With this normalization
we are able to compare excess variance estimates derived
from different segments of a particular lightcurve or from
lightcurves of different sources. The statistic $\sigma^2_{NXS}$ is an
estimate of the (squared) fraction of the total flux per bin
that is variable, corrected for the experimental noise. According
to the Parseval theorem, the contribution to the
intrinsic variance due to variations between the shortest
and longest time scales sampled, which $\sigma^2_{NXS}$ measures,
should be roughly equal to the integral of the intrinsic
PSD between the lowest and highest frequencies sampled.

The error$^7$ on $\sigma^2_{NXS}$, asymptotically for large N, is
given by the variance of the quantity $(x_i - \bar{x})^2 - \sigma^2_{err,i}$,
i.e.

$$\Delta\sigma^2_{NXS} = \frac{S_D}{\bar{x}^2 \sqrt{N}}, \qquad (2)$$

$$S_D^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left\{ \left[ (x_i - \bar{x})^2 - \sigma^2_{err,i} \right] - \sigma^2_{NXS}\, \bar{x}^2 \right\}^2 .$$

As mentioned earlier, the performance of the excess
variance, under various intrinsic PSD models, in the case
of evenly sampled light curves has already been investigated
by Vaughan et al. (2003); here we intend to explore
instead the performance in the case of sparse sampling
and low S/N. We also point out that if each point in
the lightcurve has equal weight, then $\sigma^2_{NXS}$ is indeed
a maximum-likelihood estimator of the lightcurve variance.
This is no longer true in cases where the errors
differ significantly from point to point, and a numerical
approach is needed in order to obtain the maximum-likelihood
estimate of the intrinsic variance (see details
in Almaini et al. 2000). We will explore this case as well
in the following sections.
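As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the normalized excess variance and its formal error for a binned lightcurve. The function name and the list-based interface are our own choices, and $S_D^2$ is taken to be the sample variance of the per-bin quantity $(x_i - \bar{x})^2 - \sigma^2_{err,i}$, as described in the text; it is not the code used for the simulations.

```python
import math


def normalized_excess_variance(rates, errors):
    """Return (sigma2_nxs, delta_sigma2_nxs) following Eqs. (1) and (2)."""
    n = len(rates)
    if n < 2 or n != len(errors):
        raise ValueError("need at least two bins and one error per bin")
    mean = sum(rates) / n
    # Per-bin quantity (x_i - mean)^2 - sigma_err_i^2 entering Eq. (1).
    q = [(x - mean) ** 2 - e ** 2 for x, e in zip(rates, errors)]
    sigma2_nxs = sum(q) / (n * mean ** 2)
    # S_D^2 is the sample variance of q about its mean, sigma2_nxs * mean^2.
    q_mean = sigma2_nxs * mean ** 2
    s_d = math.sqrt(sum((qi - q_mean) ** 2 for qi in q) / (n - 1))
    return sigma2_nxs, s_d / (mean ** 2 * math.sqrt(n))


# Hypothetical four-bin lightcurve with unit errors.
sigma2, delta = normalized_excess_variance([8.0, 12.0, 10.0, 10.0],
                                           [1.0, 1.0, 1.0, 1.0])
```

Note that the estimate can be negative when the measurement errors exceed the observed scatter, which is the regime discussed later for very faint sources.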

3. MONTE CARLO SIMULATIONS OF AGN 
LIGHTCURVES 

3.1. The algorithm and the simulated CDFS light 
curves 

In order to quantify the bias and the uncertainty of
the excess variance as an estimator of the intrinsic source
variance in the case of very unevenly sampled light curves
of faint sources, we performed Monte Carlo simulations.
We modified the original code of Timmer & Koenig (1995),
which generates red-noise data with a power law PSD, in
order to reproduce the real data extraction process,
including filtering and background subtraction. For each
AGN we simulated the actual lightcurve measurement:
we first create an intrinsic AGN lightcurve with the above
algorithm, following the appropriate PSD. Then we add
to the AGN count rate, in each time bin, the contribution
from the expected background, randomly adding Poisson
fluctuations to both terms. A second local background
estimate is also generated (again including Poisson
fluctuations), and then subtracted from the AGN lightcurve,
as done for real data.

In order to account for the effect of red noise leak,
which transfers power from low to high frequencies, we
generate lightcurves which are 5 times longer than the
largest timescale sampled by the data, and extract a segment
of the required length. We verified that extending
the simulated lightcurves further does not significantly
change our results, while considerably increasing the
processing time.
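The steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the modified Timmer & Koenig code used in this work: the red-noise series is synthesized by a direct sum over Fourier components (equivalent to the inverse transform, but slower), the extracted segment itself is rescaled to the requested mean and normalized variance, Poisson draws above a mean of 50 use a rounded Gaussian approximation, and the errors come from simple Poisson error propagation. All function and parameter names are hypothetical.

```python
import math
import random


def red_noise(n_bins, beta, rng):
    """Zero-mean series with PSD proportional to f^-beta (Timmer & Koenig style)."""
    series = [0.0] * n_bins
    for j in range(1, n_bins // 2 + 1):
        # Gaussian real and imaginary parts with variance S(f_j)/2 at each frequency.
        amp = math.sqrt(0.5 * (j / n_bins) ** (-beta))
        a, b = rng.gauss(0.0, amp), rng.gauss(0.0, amp)
        for k in range(n_bins):
            phase = 2.0 * math.pi * j * k / n_bins
            series[k] += a * math.cos(phase) + b * math.sin(phase)
    return series


def poisson(lam, rng):
    """Poisson draw: Knuth's method for small means, rounded Gaussian above 50."""
    if lam <= 0.0:
        return 0
    if lam > 50.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_observed(n_bins, beta, mean_rate, frac_var, bkg_rate, dt, rng, factor=5):
    """Return background-subtracted count rates and their errors for n_bins bins."""
    # Generate a curve `factor` times longer and cut a random segment (red-noise leak).
    long_curve = red_noise(factor * n_bins, beta, rng)
    start = rng.randrange(len(long_curve) - n_bins + 1)
    segment = long_curve[start:start + n_bins]
    # Assumption: rescale the segment itself to the requested mean and variance.
    mu = sum(segment) / n_bins
    sd = math.sqrt(sum((s - mu) ** 2 for s in segment) / n_bins)
    rates, errors = [], []
    for s in segment:
        source = max(0.0, mean_rate * (1.0 + math.sqrt(frac_var) * (s - mu) / sd))
        # Poisson fluctuations on both the source and the background term.
        total = poisson(source * dt, rng) + poisson(bkg_rate * dt, rng)
        local_bkg = poisson(bkg_rate * dt, rng)  # second, independent background estimate
        rates.append((total - local_bkg) / dt)
        errors.append(math.sqrt(total + local_bkg) / dt)
    return rates, errors


rng = random.Random(42)
rates, errors = simulate_observed(64, 1.5, 0.1, 0.042, 0.06, 1.0e4, rng)
```

The example call uses the Table 1 values (0.1 cnt/s, normalized variance 0.042, background 0.06 cnt/s, 10 ks bins) with only 64 bins to keep the direct-sum synthesis fast; a full-length lightcurve would call for an FFT-based implementation.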

7 Note there was a typographical error in Nandra et al. (1997), in
that the equation for the error on $\sigma^2_{NXS}$ should have had the quantity
inside the rms summation squared, as clarified by Turner et al.
(1999). Also see Edelson et al. (2002) and Vaughan et al. (2003)
for alternative expressions and a discussion of the different formulae.

[Figure 1 legend: Mean count rate = 0.118 ± 0.011, excess variance = 0.007 ± 0.002;
mean count rate = 0.099 ± 0.020, excess variance = 0.041 ± 0.001]

Fig. 1. Simulated AGN lightcurve according to the input parameters in Table 1, reproducing a continuous sampling (black crosses) and
the sampling pattern of the XMM-Newton observations of the CDFS (red circles). Mean count rate and excess variance estimates refer to
the particular simulation extracted from a set of 5000 simulations.



In order for our experiment to be as close as possible
to reality, we performed Monte Carlo simulations of AGN
lightcurves assuming the sampling pattern and uncertainties
of the XMM-Newton observations of the CDFS.
In particular we take into account only the first 1 Ms of
observations, taken between 2001 and 2002, to study a
worst-case scenario before discussing, in the next sections,
more favourable ones. As a starting point, we simulated
red-noise lightcurves with the intrinsic count rate and
variance of one of the brightest AGN observed by XMM-Newton
in the CDFS (source id 68 from Giacconi et al.
2002) at z ~ 0.54, using as input the set of parameters
reported in Table 1. The source has a soft (and
hard) flux of $\sim 5 \times 10^{-14}$ erg/s/cm$^2$, i.e. we expect
10-2 of these sources per square degree, according to,
e.g., Hasinger et al. (1993); Luo et al. (2008). Compared
to other bright AGNs in the field, this source has the
advantage of being fairly isolated and thus its flux and
variability can be robustly estimated. We explore PSD
slopes ranging from 1 to 3; in the following we show
the results for simulations with $\beta = 1.5$, but we
discuss the results in all the other cases as well.

Fig. 1 shows an example of a simulated lightcurve: the
red points highlight the sampling pattern of the XMM-Newton
observations, compared to the whole underlying
lightcurve. The first group of points corresponds to
the two observations of July 2001 with an effective exposure,
after filtering high background periods, of ~ 80
ks and the second (with more data points) to the six
observations of January 2002 for an additional 900 ks.
The whole simulated lightcurve with continuous sampling
(black crosses) thus spans $\sim 1.5 \times 10^7$ sec, i.e. about
6 months, out of which the actual XMM-CDFS observations
(red circles) sample $\sim 9.8 \times 10^5$ sec (~ 11 days).
This type of observing pattern is driven primarily by the
typical scheduling requirements of deep multi-cycle campaigns,
and thus represents a recurring, although undesirable,
observing scheme, which was the only one available
to astronomers until the 2009 extended XMM observing
campaign of the CDFS (Comastri et al. 2011).



TABLE 1

Input parameters of the simulated AGN lightcurves

Power-law PSD index ($\beta$)                         1, 1.5, 2, 2.5, 3
Number of simulations (N)                           5000
Mean count rate                                     0.1 cnt/s
Time resolution ($\Delta t$)                        10 ks
Intrinsic lightcurve variance ($\sigma^2_{in}$)     0.042
Background level                                    0.06 cnt/s



The figure also reports the mean count rate and excess 
variance measured over the whole lightcurve and over 
the intervals sampled by the XMM observations, for this 
specific realization. As discussed in more detail below, 
when sampling the whole lightcurve the measured val- 
ues reproduce the input parameters, while in the case of 
sparse sampling we obtain biased results. 

3.2. The distribution of $\sigma^2_{NXS}$ in the case of sparsely
sampled light curves

In Fig. 2 we present the excess variance distribution of
a set of 5000 simulations of sparsely sampled lightcurves
such as the one shown in Fig. 1, for the case of an intrinsic
PSD with power-law slope $\beta = 1.5$ (solid line).
The dashed line in the same figure represents instead
the distribution of the maximum likelihood variance estimator
as proposed by Almaini et al. (2000). The vertical
dot-dashed line in Fig. 2 marks the intrinsic variance
($\sigma^2_{in} = 0.042$, i.e. 20.5% r.m.s.).

Although the errors on each point of the lightcurve
are not identical, the sample distribution of the variance
measured through the numerical estimate of
Almaini et al. (2000) does not differ much from the
distribution of the excess variance, at such count rate levels.
Both distributions are in fact highly peaked at values
smaller than the intrinsic source variance. The median
value of the $\sigma^2_{NXS}$ distribution is listed in Table 2 for
simulations with (1) continuous sampling, (2) sparse sampling,
(3) sparse sampling using the maximum-likelihood
estimator and (4) correcting for the true mean count rate.
The lower and upper bounds of the interval containing
90% of the distribution are in brackets. Both the maximum
likelihood estimator and $\sigma^2_{NXS}$ are thus "biased" estimators
of the intrinsic source variance. In addition, both
distributions are very broad, and highly skewed towards large
positive values. Clearly, neither an individual measurement
of $\sigma^2_{NXS}$ nor its maximum likelihood equivalent can be
considered a reliable estimate of the intrinsic source variance.
We also note that, using Eq. 2, the median error on
$\sigma^2_{NXS}$ is equal to ~ 0.006, i.e. the formal error tends to
underestimate the true scatter and does not account for the
asymmetry of the distribution, as it does not include the effect
of the sparse sampling. As shown in Table 2, in the case of a
continuously sampled AGN lightcurve the distribution of
$\sigma^2_{NXS}$ is, as expected, quite narrow and strongly peaked
at the intrinsic source variance. Very similar results are
found assuming different indices $\beta$ of the power-law
PSD.

[Figure 2 legend: sparsely sampled excess variance distribution; maximum likelihood approach;
mean-corrected excess variance distribution; expected median value]

Fig. 2. Excess variance distribution based on a set of 5000
simulated AGN lightcurves, such as the one shown in Fig. 1,
reproducing the sampling pattern of the XMM-Newton observation
of the CDFS (solid black line), compared to the expected input
value (vertical red line). The dotted red line represents the same
distribution corrected for the true intrinsic mean count rate (see
discussion in the text), while the maximum likelihood approach
is shown by the dashed blue line. The $\sigma^2_{NXS}$ distribution for a
continuously sampled lightcurve is not shown here since it is an
extremely narrow distribution peaked at the intrinsic value of the
variance.

We conclude that the sparse sampling does indeed
result in a biased distribution of the excess variance,
and increases the "uncertainty" on each individual value.
Using a sparse sampling pattern the variance of the
lightcurve is underestimated, mainly because each realization
poorly reproduces the intrinsic mean count rate;
in fact the value derived from the sparsely sampled data
will always be closer to the sampled points than the true
mean, thus minimizing the variance$^8$. To demonstrate
this point, we fixed the average count rate $\bar{x}$ in Eq. (1)
to its intrinsic value, finding that in such a case the
output variance approaches on average the input value
(see Table 2 and dotted red line in Fig. 2), while still
retaining the large scatter.
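The reason why the sample mean minimizes the measured variance follows from the identity $\frac{1}{N}\sum (x_i - \mu)^2 = \frac{1}{N}\sum (x_i - \bar{x})^2 + (\bar{x} - \mu)^2$, valid for any reference value $\mu$. The short Python check below illustrates it with hypothetical count rates from a sparse segment that happens to lie below an assumed true mean of 0.10 cnt/s; the numbers are purely illustrative.

```python
def mean_square_deviation(values, reference):
    """Mean of (x_i - reference)^2 over the sampled points."""
    return sum((v - reference) ** 2 for v in values) / len(values)


# Hypothetical count rates from a sparse segment lying below the true mean.
sparse = [0.08, 0.07, 0.09, 0.08]
true_mean = 0.10
sample_mean = sum(sparse) / len(sparse)

about_sample = mean_square_deviation(sparse, sample_mean)
about_true = mean_square_deviation(sparse, true_mean)
# The two differ exactly by the squared offset between sample and true mean.
shortfall = (sample_mean - true_mean) ** 2
```

Since the offset term is never negative, the scatter measured about the sample mean can only be smaller than (or equal to) the scatter about the true mean, which is the origin of the downward bias discussed above.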

3.3. The $\sigma^2_{NXS}$ bias



8 Note that while the mean count rate that we measure for each 
individual realization of the sparsely sampled lightcurves is biased, 
and thus minimizes the variance, its distribution over the entire set 
of 5000 simulations peaks at the expected input value. 




Fig. 3. Bias distribution based on a set of 5000 simulated
AGN lightcurves such as the one shown in Fig. 1, reproducing the sampling
pattern of the XMM-Newton observation of the CDFS.

The numerical experiment we discussed above can be
used in principle to correct the measured variances, in
order to retrieve the true intrinsic value. To this end, for
each of the 5000 simulated lightcurves we computed the
ratio between the intrinsic variance $\sigma^2_{in}$ and the actual
excess variance measured for the particular simulated
lightcurve, $\sigma^2_{sim}$. The sample distribution of these ratios
is plotted in Fig. 3. This distribution has a large scatter
and is highly asymmetric, due to the large scatter
and highly skewed nature of the $\sigma^2_{sim}$ distribution itself.
For fewer than about 25% of the simulated light curves this
ratio is < 1, but for the majority of them it is > 1. We
can then define the "median bias" of the estimated excess
variance, which in essence indicates the correction
factor that is needed to retrieve the intrinsic variance, as
follows:

$$b = \frac{\sigma^2_{in}}{{\rm median}(\sigma^2_{sim})}, \qquad (3)$$

where ${\rm median}(\sigma^2_{sim})$ is the median of the $\sigma^2_{sim}$
distribution. This definition of the bias is similar to the
Almaini et al. (2000) definition, although in the latter
case the authors defined the bias using the standard
deviation instead of the variance of the lightcurve.
They derived average correction factors in the range 1-1.34,
using lightcurves spread over periods from 2 to 14 days,
with the largest values for the faintest QSO with
only two widely spaced temporal bins. In the case of the
sampling pattern reproducing the XMM-Newton observations
of the bright source n. 68, we derived b = 1.8 for
$\beta = 1.5$. We also calculated the bias values for different
intrinsic power spectra, finding that the bias changes for
different slopes, but in all cases the intrinsic variance
is less than 2 times larger than the median excess variance
$\sigma^2_{sim}$ that we measure from the sparse lightcurves (see
Table 2).
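The median bias, defined as the ratio of the intrinsic variance to the median of the simulated excess variances, can be computed from a set of simulated values with a few lines of Python. The sketch below is illustrative only: the function name, the nearest-rank quantile rule and the treatment of non-positive excess variances (mapped to an infinite correction factor) are our assumptions. The example uses the $\beta = 1.5$ sparse-sampling values of Table 2 as a three-point stand-in for a full simulated distribution.

```python
import math
import statistics


def median_bias(intrinsic_variance, simulated_variances, coverage=0.90):
    """Return (b, b_low, b_high) with b = intrinsic / median(simulated)."""
    ordered = sorted(simulated_variances)
    last = len(ordered) - 1
    tail = (1.0 - coverage) / 2.0
    # Nearest-rank bounds of the interval containing `coverage` of the values.
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]

    def ratio(value):
        # A non-positive excess variance gives no finite correction factor.
        return intrinsic_variance / value if value > 0 else math.inf

    return ratio(statistics.median(ordered)), ratio(upper), ratio(lower)


# Median and 90% bounds of the sparse case for beta = 1.5 (Table 2).
b, b_low, b_high = median_bias(0.042, [0.007, 0.023, 0.035])
```

With these inputs the sketch returns b close to 1.8 with bounds of about 1.2 and 6.0, matching the corresponding Table 2 entry.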

If the bias were known a priori, it could be used to rescale
the measured excess variance, and correct for the effects
of both the red noise leak and the sampling pattern
in the measurement of this quantity. However, the bias
of the individual realizations has a large scatter due to
the broad and strongly asymmetric $\sigma^2_{sim}$ distribution, an
effect which has not been properly considered in previous
works. The bias factors shown in Table 2 are median
values over 5000 simulations, while the individual excess
variances can differ much more from the intrinsic variance.
Therefore, given the broad and skewed distribution
shown in Fig. 2, the bias on an individual lightcurve can be
2-3 times higher than the one estimated using Eq. 3, and
extreme care must be taken when inferring the
variability parameters from single observations of AGN
with such extreme sampling patterns.

X-ray Variability of AGN

TABLE 2

Median $\sigma^2_{NXS}$ and bias for continuous and sparse sampling

$\beta$   Continuous              Sparse               Max. likelihood      Mean Corr.           b
          (1)                     (2)                  (3)                  (4)                  (5)
1         0.0418(0.0415,0.0420)   0.030(0.004,0.043)   0.017(0.005,0.036)   0.040(0.015,0.050)   1.4(1.,10.4)
1.5       0.0418(0.0415,0.0420)   0.023(0.007,0.035)   0.015(0.008,0.035)   0.041(0.021,0.070)   1.8(1.2,6.0)
2         0.0418(0.0415,0.0420)   0.025(0.005,0.042)   0.016(0.007,0.043)   0.052(0.022,0.078)   1.7(1.,8.36)
2.5       0.0418(0.0415,0.0420)   0.032(0.008,0.042)   0.018(0.010,0.052)   0.064(0.030,0.110)   1.3(0.9,5.2)
3         0.0418(0.0415,0.0420)   0.037(0.011,0.061)   0.020(0.012,0.054)   0.070(0.033,0.150)   1.1(0.7,3.8)

TABLE 3

Median $\sigma^2_{NXS}$ and bias as a function of S/N ratio for $\beta = 1.5$

Mean count rate   S/N    Source Flux              Median $\sigma^2_{NXS}$   b
cnt/s (cnt/bin)          (erg s$^{-1}$ cm$^{-2}$)
0.1 (1000)        25     $6.25 \times 10^{-13}$   0.022(0.007,0.034)        1.9(1.2,6)
0.05 (500)        22.6   $3.12 \times 10^{-13}$   0.022(0.007,0.034)        1.9(1.2,6)
0.01 (100)        6.3    $6.25 \times 10^{-14}$   0.022(0.005,0.036)        1.9(0.007,10.2)
0.005 (50)        3.4    $3.12 \times 10^{-14}$   0.021(0.002,0.038)        2(1,21)
0.002 (20)        1.4    $1.25 \times 10^{-14}$   0.016(-0.54,0.66)         2.52(0.06,$\infty$)
0.001 (10)        0.8    $6.25 \times 10^{-15}$   <

TABLE 4

Median $\sigma^2_{NXS}$ and bias as a function of gap length

Temporal Gap   Median $\sigma^2_{NXS}$   b
(days)
5.8            0.039(0.027,0.050)        1.02(0.74,1.48)
11.6           0.036(0.022,0.050)        1.12(0.80,1.82)
28.9           0.030(0.016,0.050)        1.32(0.80,2.54)
57.9           0.027(0.13,0.52)          1.48(0.76,3.07)
115.7          0.024(0.10,0.50)          1.62(0.80,4.03)
231.5          0.021(0.07,0.47)          1.90(0.85,5.71)

4. WHAT AFFECTS THE OBSERVED 
VARIABILITY BIAS? 

4.1. Bias Dependence on the source flux 

As discussed in the previous section, a strongly unevenly
sampled lightcurve produces a biased estimate of the
intrinsic lightcurve variance. Such bias derives mainly 
from the inability of our data to constrain the average 
source flux due to the red noise character of the AGN 
PSD, which implies larger power at lower frequencies. 
More importantly, a sparse sampling produces a wide ex- 
cess variance distribution, indicating that each individual 
measurement could differ significantly from the intrinsic 
variance, even if an average correction is applied to our 
measurement. 

Obviously we expect a dependence of the bias and its
scatter on the source flux as a result of the white noise
introduced by Poisson fluctuations. To estimate such
effects, we simulated lightcurves assuming different average
count rates, corresponding to fluxes smaller than
that of source 68, as is the case for the bulk of
the AGN population detected in the CDFS. Table 3
shows the bias dependence on the source flux, in the case
of the XMM-CDFS observation pattern, fixing the PSD
slope $\beta = 1.5$. The excess variance is the median of the
distribution based on a set of 5000 simulations, while the
bias is estimated using Eq. 3, with the errors coming
from the upper and lower bounds of the interval containing
90% of the excess variance distribution. Conversion factors
from counts to fluxes were calculated assuming a power law
spectrum with $\alpha_{ph} = 1.4$ and $N_H = 8 \times 10^{19}$ cm$^{-2}$.
The excess variance estimates and bias factor do not
change significantly with the source flux down to count
rates of ~ 0.005 cnt/s (which correspond to a S/N ratio
per bin of 3.4 given the assumed XMM background),
while the width of the excess variance distribution increases.
At lower S/N levels the bias increases up to a
point where we are no longer able to detect variability,
since the excess variance distribution is wide and
the median value becomes negative. We verified that the
results do not depend on the specific value of $\beta$ that we
use.

This result suggests that a minimum S/N ratio per
bin > 1.5-2 is advisable for estimating the intrinsic
excess variance in the case of sparsely sampled lightcurves.
Moreover we verified that the same bias is observed
when using the maximum-likelihood approach proposed
by Almaini et al. (2000), for all the considered S/N ratios.
Note that in the low count regime the Almaini
approach cannot predict a negative intrinsic variance by
construction and thus yields a ML value of 0.

4.2. Bias Dependence on the Gap Length 

Apart from such dependence on sampling pattern and
source flux, we expect that the bias will change as a function
of the gap length. To test this effect on the intrinsic
variance estimator, we simulated AGN lightcurves with
the total exposure time of the XMM-CDFS observations
(440 ks) sampled by two blocks of observations of 220 ks
each, using the input parameters shown in Table 1. The
temporal gap between the observations ranges from the
extreme case of ~ 7 months, similar to the gap in the
XMM-CDFS pattern, to the more favourable case of ~ 6
days. Table 4 summarizes our results, where the excess
variance is, as before, the median of the distribution based
on a set of 5000 lightcurve simulations, fixing $\beta = 1.5$,
while the bias is estimated using Eq. 3 and the errors are
derived from the bounds of the $\sigma^2_{NXS}$ distribution.

As expected both the median bias and the width of the 
excess variance distribution increase with increasing gap 
length; the same trend is observed assuming different 
power-law slope values. 

Thus again, as discussed in §3 for the XMM sampling
pattern, because of the large uncertainties associated with
the excess variance estimate, each individual lightcurve
measurement yields an extremely poor estimate of the intrinsic
source variability, and such uncertainties increase
as a function of the gap length, as shown by the errors
in Table 4. In such cases the only way to make a more
robust estimate is to collect repeated observations of the
same source, in order to lower the statistical uncertainties
(assuming that the process producing the variability is
stationary). Alternatively, large samples of sources may
provide a less biased ensemble estimate, assuming that
the underlying PSD is similar for all sources.
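The two-block sampling used for the gap-length test can be expressed as a boolean mask over the time bins of a continuously sampled lightcurve. The Python sketch below is a hypothetical helper, assuming 10 ks bins and gap lengths of 500 to 20000 ks, which round to the day values listed in Table 4; the exact gap lengths used in the simulations are not stated in the text.

```python
KS_PER_DAY = 86.4  # 1 day = 86.4 ks


def two_block_mask(block_ks, gap_ks, dt_ks=10):
    """Boolean mask over time bins: observed block, unobserved gap, observed block."""
    n_block = block_ks // dt_ks
    n_gap = gap_ks // dt_ks
    return [True] * n_block + [False] * n_gap + [True] * n_block


def apply_mask(curve, mask):
    """Keep only the bins of a continuous curve that fall inside the blocks."""
    return [x for x, keep in zip(curve, mask) if keep]


# Assumed gap lengths in ks; they round to the day values listed in Table 4.
GAPS_KS = [500, 1000, 2500, 5000, 10000, 20000]
gap_days = [round(g / KS_PER_DAY, 1) for g in GAPS_KS]

# Two 220 ks blocks separated by the shortest gap: 44 observed bins out of 94.
mask = two_block_mask(220, GAPS_KS[0])
```

Applying `apply_mask` to a simulated continuous lightcurve of the same length then yields the sparsely sampled points from which the excess variance is measured.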

4.3. Ensemble Excess Variance Estimate 

A collection of several observations of the same source
or a large sample of AGN may produce a less biased
estimate of the AGN variability under some particular
assumptions (stationary variability process or same PSD
for all AGN). In order to verify how reliably we can
constrain the source variance through repeated/multiple
observations, we binned the 5000 simulated excess variances
obtained by using the XMM pattern, as described
in §4.1, in groups of 5, 10, 20 and 50 points. For each bin
we estimated the mean excess variance and its standard
deviation. The distributions of the 5, 10, 20 and 50-points
binned mean-$\sigma^2_{NXS}$ and of its standard deviation
are shown in Fig. 4 for a count rate of 0.1 cnt/s and
$\beta = 1.5$.

The resulting mean-$\sigma^2_{NXS}$ distributions do not peak
at the intrinsic variance, as the individual realizations
are in any case biased due to the sparse sampling. However,
these distributions are now more symmetric and roughly
Gaussian. A Kolmogorov-Smirnov test performed on the
5, 10, 20 and 50-points mean-$\sigma^2_{NXS}$ distributions indicates
that only for the 5-points grouping can we reject
the hypothesis of a Gaussian distribution at the > 95% level.
Furthermore, if we compare the standard deviation of the binned
distributions in the upper panel of Fig. 4 (whose values
are shown in the inset as the errors) to the scatter of the
individual realizations in each of the n = 5, 10, 20 and 50
points bins, we find that such scatter (divided by $\sqrt{n-1}$)
is on average representative of the uncertainty on the
binned mean-$\sigma^2_{NXS}$; in fact the error in the upper panel
is equal to the mean value of the distribution in the lower
panel (due to the central limit theorem). In practice this
means that when binning our data, we can estimate the
uncertainty on each mean-$\sigma^2_{NXS}$ simply from the scatter
of the individual points composing each bin.

Fig. 4. Upper panel: distribution of the mean-$\sigma^2_{NXS}$
estimated by binning 5000 simulated excess variances (adopting the
XMM sampling pattern of §4.1), in groups of 5, 10, 20 and 50 points
(according to the legend). The inset shows the median values of
the binned distributions and their st.dev. The simulations are
performed by assuming a count rate of 0.1 cnt/s and $\beta = 1.5$. Lower
panel: distribution of the errors on the mean-$\sigma^2_{NXS}$ estimated as
the st.dev. of the points within each bin, divided by $\sqrt{n-1}$. The
inset reports the mean values of the distribution for the different
binning.
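The binning procedure described in this section can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; we assume that the scatter within each group is the population standard deviation, so that dividing it by $\sqrt{n-1}$ gives the usual standard error of the mean, and that any incomplete trailing group is discarded.

```python
import math
import statistics


def binned_means(values, n):
    """Group values in consecutive sets of n; return (mean, error) per group."""
    groups = [values[i:i + n] for i in range(0, len(values) - n + 1, n)]
    # Error on each mean: scatter of the n points divided by sqrt(n - 1).
    return [(statistics.mean(g), statistics.pstdev(g) / math.sqrt(n - 1))
            for g in groups]


# Hypothetical stand-in for a list of simulated excess variances.
example = [float(v) for v in range(1, 11)]
result = binned_means(example, 5)
```

In an application to real data, `values` would hold the excess variances of repeated observations of one source, or of the individual sources in a sample, under the stationarity or common-PSD assumptions stated above.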



According to the results described in §4.1, we expect
that the spread of the distributions of mean excess variances
increases with decreasing source flux. In fact,
down to count rates of ~ 0.005 cnt/s (S/N = 3.4), the
10, 20 and 50-points mean-$\sigma^2_{NXS}$ distributions are still
Gaussian, but the errors rise, as does the discrepancy between
the median values of the mean-$\sigma^2_{NXS}$ distributions
and the intrinsic variance. We verified that this trend
does not depend on $\beta$. These results imply that if one
bins together 10, 20 or 50 excess variances estimated for a
moderately bright AGN sample, then the corresponding
binned mean $\sigma^2_{NXS}$ is roughly a Gaussian variable, and
the associated uncertainty is equal to the scatter of the
individual binned $\sigma^2_{NXS}$, divided by $\sqrt{n-1}$, irrespective
of $\beta$. However, at low fluxes (count rates < 0.002 cnt/s,
S/N < 1.4), the errors become dominant and the scatter
on the mean excess variance is > 100% (see Fig. 5).

Similarly we expect a dependence of the average excess
variances on gap length. To test such an effect we applied
the same binning method to 5000 simulated excess variances
obtained by using the sampling pattern described
in §4.2. For temporal gaps below ~ 58 days, the
mean-$\sigma^2_{NXS}$ distributions obtained with the 5, 10, 20 and
50-points binning are Gaussian, while increasing the gap
length up to ~ 7 months, the 5-points mean excess variance
distribution becomes non-Gaussian at the > 95% level. As
before, the means of these distributions do not peak at
the input variances (i.e. they are biased) and the discrepancy
with respect to the intrinsic variance increases with
the temporal gap, as does the uncertainty on the mean
values of the distributions.

[Figure 5 legends. Upper panel: 5, 10, 20 and 50 points-bin distributions, with
medians 0.02 ± 0.19, 0.02 ± 0.13, 0.02 ± 0.09 and 0.02 ± 0.06. Lower panel:
5, 10, 20 and 50 points distributions, with means 0.17, 0.13, 0.09 and 0.06.]

Fig. 5. As Fig. 4, but for 0.001 cnt/s.

[Figure legend: lightcurve panel with mean count rate = 0.077 ± 0.013, excess variance = 0.025 ± 0.005;
mean count rate = 0.097 ± 0.024, excess variance = 0.059 ± 0.012; x-axis: Time (in seconds)]

4.4. Uniform and Progressive Sampling 

In order to test more favourable scenarios, better suited
to reducing the bias in AGN variability estimates, we
generated two additional sets of lightcurves with the input
parameters shown in Table 1, adopting different sampling
patterns which span the same maximum timescale
as the XMM observations described in §3:

1. Uniform sampling, consisting of 9 observations of
50 ks each separated by constant temporal gaps of
1900 ks (~ 20 days; Fig. 6, upper panel);

2. Progressive sampling, where the observations are
separated by increasing lags according to the
expression gap = $2^n$ x 10 ks, with n = 1, 2, ..., 8
(Fig. 6, lower panel).
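As an illustration, the two sampling patterns listed above can be
written down explicitly. The sketch below makes two assumptions that
are not stated in the text: each gap is measured from the end of one
observation to the start of the next, and the progressive scheme also
uses 50 ks observations. The helper name `observation_windows` is
hypothetical.

```python
import numpy as np

KS = 1.0e3  # seconds per kilosecond

def observation_windows(gaps_ks, obs_length_ks=50.0):
    """Return an array of (start, stop) times in seconds.

    len(gaps_ks) gaps produce len(gaps_ks) + 1 observations.
    """
    starts = [0.0]
    for gap in gaps_ks:
        # next observation starts one gap after the previous one ends
        starts.append(starts[-1] + obs_length_ks + gap)
    starts = np.array(starts) * KS
    return np.column_stack([starts, starts + obs_length_ks * KS])

# Uniform sampling: 9 observations, 8 constant gaps of 1900 ks.
uniform = observation_windows([1900.0] * 8)

# Progressive sampling: gap = 2**n x 10 ks, with n = 1, ..., 8.
progressive = observation_windows([10.0 * 2**n for n in range(1, 9)])

print(uniform / KS)
print(progressive / KS)
```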

[Fig. 6 panel labels: Mean Count Rate = 0.077 ± 0.013,
Excess Variance = 0.025 ± 0.005 (•); Mean Count Rate = 0.083 ± 0.017,
Excess Variance = 0.040 ± 0.006 (+). Legend: Continuous Sampling;
• Progressive Sampling. x-axis: Time (in seconds).]

Fig. 6. - Simulated AGN lightcurves (black crosses) with the
uniform (upper panel) and progressive (lower panel) sampling schemes
marked by red circles. The figure also reports the mean count rate
and excess variance measured for the particular simulation over the
whole lightcurve and over the intervals with uniform and
progressive sampling.

[Fig. 7 panel labels: Median (Progressive) = 0.032, Median (Uniform)
= 0.039 (asymmetric errors illegible); Expected Median Value.
x-axis: Normalised Excess Variance.]

Fig. 7. - Excess variance distribution for N = 5000 lightcurve
simulations with uniform (solid line) and progressive (dotted line)
sampling; the errors are the 90% upper and lower quartiles of the
$\sigma^2_{NXV}$ distribution. The uniform sampling removes the bias:
the mean of the distribution is in agreement with the expected
value of the excess variance (red line), equal to the input parameter
of the simulation ($\sigma$ ~ 20%). For a progressive sampling, the bias
in the intrinsic variance slightly increases.

Fig. 7 shows the normalized excess variance distribution
derived from 5000 simulations for the two sampling
schemes described above. We observe that the distributions
are now more symmetric, and closer to Gaussian,
than was the case for the original XMM pattern (Fig. 1).
A regular sampling pattern also minimizes the median
bias (b = 1.01(0.80,1.32)) in the intrinsic variance
estimates, with a median $\sigma^2_{NXV}$ consistent with the expected
value, although the individual measurements still have
uncertainties of ~ 25%. For the progressive sampling the
median bias is somewhat larger (b = 1.19(0.88,1.80)),
as this sampling pattern favors short time scales, while
the dominant contribution to the total variance is due to
longer ones.
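For reference, the quantities compared above can be computed as in
the following sketch. It assumes the commonly used definition of the
normalized excess variance (sample variance minus the mean squared
measurement error, divided by the squared mean), and it assumes that
the bias b is the ratio of the intrinsic to the measured variance.
Both the 1/(N-1) normalization and the function names are choices
made here for illustration; the mock data are not taken from the
simulations discussed in the text.

```python
import numpy as np

def normalized_excess_variance(rate, rate_err):
    """Normalized excess variance of a lightcurve (assumed definition)."""
    rate = np.asarray(rate, dtype=float)
    rate_err = np.asarray(rate_err, dtype=float)
    sample_var = rate.var(ddof=1)          # total observed variance
    noise_var = np.mean(rate_err ** 2)     # contribution of measurement errors
    return (sample_var - noise_var) / rate.mean() ** 2

def bias_factor(intrinsic_nxv, measured_nxv):
    """Bias b, assumed here to be intrinsic / measured variance."""
    return intrinsic_nxv / measured_nxv

# Mock lightcurve (assumption): 20% intrinsic rms plus Gaussian errors.
rng = np.random.default_rng(42)
true_rate = 0.1 * (1.0 + 0.2 * rng.standard_normal(100))
err = np.full(100, 0.01)
observed = true_rate + err * rng.standard_normal(100)

nxv = normalized_excess_variance(observed, err)
print(nxv, bias_factor(0.2 ** 2, nxv))
```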

If we consider a sparsely sampled lightcurve,
the preferable observing scheme is thus a regular pattern
with temporal gaps not much longer than the length of
each observation. In this situation the observations can
be used to estimate the intrinsic source variance even
from single observations, although with significant
uncertainties. The progressive sampling may be preferred if we
intend to trace the whole PSD (as opposed to just the
variance), but such measurements require higher S/N
ratios and repeated measurements to average over the
intrinsic scatter of any stochastic process.

5. CONSTRAINTS ON THE OBSERVING
STRATEGY OF FUTURE X-RAY SURVEYS

Several missions have been proposed over the past few
years to study high redshift AGNs; most of these are
designed to have a larger effective area than current X-ray
missions, a wider field of view (FOV) and, depending on the
planned orbit, a lower background. For instance, the
International X-ray Observatory (IXO, Barcons et al. 2011)
and its evolution Athena$^{9}$, and the Wide Field X-ray
Telescope (WFXT, Murray et al. 2010), all represent
missions capable of performing AGN surveys with higher
speed than Chandra or XMM. The results discussed in
§4.4 allow us to explore the capabilities of such future X-ray
missions in the time domain. In particular, we examine
the expectations for deep, wide-area surveys, which
will allow us to probe the highest redshift and faintest AGN
populations at the expense of continuous temporal
coverage.

To investigate the capabilities of such missions in
measuring AGN variability, we present here the performance
of a mission with 1 m$^2$ effective area, 1 sq.deg. FOV
and the low background allowed by a low Earth orbit,
very similar to the WFXT design (Rosati et al. 2010).
This results in a large number of moderate and high
redshift AGN (see e.g. Paolillo et al. 2010). We used a total
observing time of ~ 400 ks and we evaluated the
performance that can be expected assuming a uniform
sampling scheme similar to the one presented in §4.4.
Figures 8 and 9 show an example of a possible observing
scheme for the survey, in which observations of 50 ks each
are spread evenly over ~ 6 months, and the corresponding
excess variance and bias distributions, respectively.

In order to verify the performance of such a mission
for faint AGN populations, we explored the dependence
of the measured excess variance on different values
of the source mean count rate. The results are
summarized in Table 5. The excess variance remains
relatively small (< 20%) even at the lower count rate levels.

9 http://www.mpe.mpg.de/athena/workshop_mpe_2011/index.php



[Fig. 8 panel labels: Mean Count Rate = 0.103 ± 0.015,
Excess Variance (rms) = 0.020 ± 0.004]

Fig. 8. - Simulated AGN lightcurve, sampled in 50 ks
observations spread uniformly over ~ 6 months, as expected from
future large effective area missions such as those described in
the text.

Fig. 9. - Excess variance (upper panel) and bias (lower panel)
distributions based on a set of 5000 simulated lightcurves, such as
the one shown in Figure 8. With this observing strategy we are able
to retrieve the intrinsic variance with an uncertainty of ~ 25%.

Compared to the case discussed in §3, however, we are
now able to detect variability at flux levels$^{10}$ more than
one order of magnitude lower than with XMM, using
approximately the same observing time, thus allowing variability
studies for hundreds of AGNs per square degree. This
good performance is due in part to the larger effective
area, and in part to the low background made possible
by the low Earth orbit considered here.

10 Conversion factors from counts to fluxes were calculated
assuming a power law spectrum with $\alpha_{ph}$ = 1.4 for an unabsorbed
AGN at z=0.

TABLE 5

Median $\sigma^2_{NXV}$ and bias as a function of S/N ratio for the
future mission described in §5

Mean count rate    S/N    Source Flux         Median $\sigma^2_{NXV}$    b
cnt/s (cnt/bin)           (erg s^-1 cm^-2)

0.1 (1000)         38     4 x 10^-14          0.037(0.027,0.048)   1.1(0.9,1.5)
0.01 (100)         9.3    4 x 10^-15          0.037(0.026,0.047)   1.1(0.9,1.6)
0.005 (50)         7.2    2 x 10^-15          0.036(0.024,0.048)   1.2(0.9,1.7)
0.002 (20)         3.9    8 x 10^-16          0.035(0.015,0.055)   1.2(0.7,2.8)
0.001 (10)         2.7    4 x 10^-16          0.033(0.006,0.073)   1.3(0.6,6.9)

6. DISCUSSION AND CONCLUSIONS 

In this paper we discussed the performance of current 
and future deep survey X-ray missions in the time do- 
main and their ability to measure AGN variability, using 
realistic simulations that reproduce the real data prop- 
erties. 

We show that the excess variance is a biased estimator
of the intrinsic lightcurve variance in sub-optimal
observing conditions, such as those characterizing the
2001-2002 XMM-Newton observation of the CDFS. The
same bias is observed when using alternative estimators
of the intrinsic lightcurve variance, as suggested by, e.g.,
Almaini et al. (2000). In fact, we find that when the
sampling pattern is very sparse, the intrinsic variance of the
lightcurve is underestimated, mainly because each
realization poorly reproduces the intrinsic mean count rate.

Due to the red noise nature of the AGN PSD, this bias
strongly depends on the temporal gaps between observations
on the longest timescales, while it is less sensitive
to the detailed distribution of the data points on short
timescales. Furthermore, for a fixed sampling pattern,
the bias does not change with the source flux as long as
the S/N ratio per bin is > 1.5; for lower values we are
hardly able to detect variability at all, due to the increasing
contribution of Poisson noise to the total variance.
We therefore suggest, as a rule of thumb, using sources with a
S/N ratio per bin above 1.5-2 when estimating the intrinsic
variance for sparsely sampled lightcurves. We further
verified that the bias depends only mildly on the power-law
PSD index, with a peak for $\beta$ = 1.5, and in any case remains
below 2 for all slopes tested here.
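The effect of sparse sampling on a red-noise lightcurve can be
illustrated with a toy experiment. The sketch below is not the
simulation procedure used in this paper: it assumes the Timmer &
Koenig (1995) recipe for generating power-law noise, ignores Poisson
noise and any variability on timescales longer than the lightcurve,
rescales every realization to a 20% rms for simplicity, and adopts 1
ks bins with a uniform pattern of 9 windows of 50 ks separated by
1900 ks. All function names are hypothetical.

```python
import numpy as np

def simulate_red_noise(n_points, dt, beta, rng):
    """Zero-mean Gaussian series with PSD proportional to f**(-beta)."""
    freqs = np.fft.rfftfreq(n_points, d=dt)[1:]   # skip the zero frequency
    amplitude = freqs ** (-beta / 2.0)            # sqrt of the power law
    real = rng.normal(size=freqs.size) * amplitude
    imag = rng.normal(size=freqs.size) * amplitude
    if n_points % 2 == 0:
        imag[-1] = 0.0                            # Nyquist term must be real
    spectrum = np.concatenate([[0.0], real + 1j * imag])
    return np.fft.irfft(spectrum, n=n_points)

def fractional_variance(flux):
    """Variance normalized by the squared mean (no noise term here)."""
    return flux.var(ddof=1) / flux.mean() ** 2

def sampled_indices(n_obs=9, obs_len=50, gap=1900):
    """Indices of the bins covered by n_obs evenly spaced windows."""
    starts = np.arange(n_obs) * (obs_len + gap)
    return np.concatenate([np.arange(s, s + obs_len) for s in starts])

rng = np.random.default_rng(1)
idx = sampled_indices()
n_points = int(idx[-1]) + 1

ratios = []
for _ in range(200):
    x = simulate_red_noise(n_points, dt=1.0e3, beta=1.5, rng=rng)
    flux = 1.0 + 0.2 * x / x.std()                # mean 1, rms 20%
    full = fractional_variance(flux)              # variance of the full curve
    sparse = fractional_variance(flux[idx])       # variance seen by the survey
    ratios.append(full / sparse)

print("median full/sparse variance ratio:", np.median(ratios))
```

Changing `gap`, `obs_len` or `beta` in this sketch gives a quick,
qualitative feel for how the ratio responds to the sampling pattern
and to the PSD slope, but quantitative bias factors should be taken
from simulations that include measurement noise.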

While in principle we can use simulations, such as
those described here, to correct the measured quantities
and estimate the intrinsic variance, we point out that
the uncertainties on the bias factor can be very large in
the case of irregular sampling, and the bias distribution
is very asymmetric, so that each individual lightcurve
yields a very poor estimate of the intrinsic AGN
properties. On the other hand, we showed that binning
together excess variances in groups of 10, 20 and 50 points
produces mean values that are approximately Gaussian
distributed, and whose uncertainty can simply be estimated
from the scatter of the individual points composing each
bin. These results hold irrespective of the power law slope
$\beta$, the temporal gap, and the S/N, even if the
spread of the mean excess variance distributions increases
with the gap length and with decreasing S/N.

Uneven observing patterns such as the ones discussed in
§3 and 21 are often due to the scheduling requirements of
deep multi-cycle campaigns. In order to show the benefits
deriving from a proper observing strategy, we tested two
regular observing schemes, which allow us to span the
same maximum timescale as the XMM-Newton observation
of the CDFS. We find that such schemes significantly
reduce the bias in the excess variance estimates and
produce more symmetrical distributions, with uncertainties
that range from ~ 100% down to ~ 20% for the brightest
sources. Uniform sampling patterns are those producing
the best results, although different schemes sampling a
larger range of timescales may be desirable to derive a
full PSD.

Finally, we showed that for future X-ray missions a
properly designed observing strategy may allow us to
measure variability for hundreds of sources per square
degree. Such a dataset would largely overlap with the
spectroscopic sample (e.g. Gilli et al. 2011), thus resulting
in thousands of AGNs with both temporal and spectroscopic
information. Since the individual variance estimates
will still be affected by significant uncertainties, a large
dataset will be essential in order to constrain the average
timing properties of high redshift AGNs (provided that
the AGN population shares the same intrinsic
properties).

Several dedicated timing missions have also been
proposed in the X-ray regime, such as Lobster or LOFT
(Feroci et al. 2010). In such cases the continuous
monitoring ensures a sampling pattern very close to a
continuous lightcurve, yielding unbiased variability estimates
with small uncertainties, thanks to the possibility of
averaging out the scatter intrinsic to any stochastic process.
This type of analysis, however, will be possible only for the
brightest (and mostly nearby) sources, due to the limited
angular resolution of such missions.

We want to stress that the simulations presented here
do not include additional systematics, such as, for
instance, vignetting and PSF variation across the FOV.
Readers are therefore encouraged to explore their specific
science cases using simulations that closely reproduce their
specific sampling pattern, S/N ratio, background
contamination, etc. Furthermore, the observing strategy of
future missions will likely be decided based on additional
scientific requirements, such as the need to discover and
trace transients with variable decay timescales, or to follow
up observations made by observatories at other
wavelengths (e.g. LSST), which may require adopting
strategies that are sub-optimal for AGN studies with respect
to those discussed here.